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Abstract 

We consider the problem of minimizing the sum of a convex function and a 
convex function composed with an injective linear mapping. For such problems, 
subject to a coercivity condition at fixed points of the corresponding Picard 
iteration, iterates of the alternating directions method of multipliers converge 
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show in particular that convex piecewise linear-quadratic functions naturally 
satisfy the requirements of the theory, guaranteeing eventual linear convergence 
of both the Douglas-Rac.hford algorithm and the alternating directions method 
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and denoising with multiresolution statistical constraints. 
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1 Introduction. 


The alternating directions method of multipliers (ADMM) has received a great deal 
of attention recently for large-scale problems involving constraints on the image of the 
unknowns under some linear mapping. The analysis has focused on either global com¬ 
plexity estimates [34] or sufficient conditions for local linear convergence [14,24,48]. 
The closely related Douglas-Rachford algorithm has also been the focus of recent stud¬ 
ies showing global complexity [42, 52] and (local linear) convergence in increasingly 
inhospitable settings [1,7-9,13,36,37,54], A survey of results on proximal methods 
in general can be found in [51]. In the convex setting, the convergence studies for 
both ADMM and Douglas-Rachford share a common thread through the well-known 
duality between these algorithms [29]. Studies of ADMM frequently invoke strong 
convexity. Studies of Douglas-Rachford, on the other hand have, until very recently, 
focused on feasibility problems and corresponding notions of regularity of intersec¬ 
tions. We combine an analysis of the ADMM algorithm with facts learned from the 
local convergence of Douglas-Rachford to provide sufficient conditions for local linear 
convergence of sequences generated by ADMM without strong convexity. While this 
paper was under review we became aware of two recent studies that also combine 
the analysis of ADMM and Douglas-Rachford to improve and generalize many local 
and global results [30,31]. While our theoretical results are general and abstract, our 
motivation for the current study comes from the application of statistical multiscale 
image denoising/deconvolution following [26,27] for fluorescence microscopic images 
(see also [2] for a review of fluorescence microscopy techniques and statistical meth¬ 
ods for them). We demonstrate the analysis for image denoising/deconvolution of 
Stimulated Emission Depletion (STED) images [35,38]. 

1.1 Notation and definitions 

Though many of the arguments presented here work equally well for infinite dimen¬ 
sional Hilbert spaces, to avoid technicalities, it will be assumed throughout that U 
and V are Euclidean spaces. The norm || • || denotes the Euclidean norm. We de¬ 
note the extended reals by (—oo, +oo] := 111 {+ 00 } and the nonnegative orthant 
by M + := {a; G M | a; > 0}. The closed unit ball centered at the origin is denoted 
by B. In the usual notation for the natural numbers N we include 0. The mapping 
A : U —y V is linear and the functional ,/:[/—>■ (— 00 , + 00 ] is proper (not everywhere 
Too and nowhere — 00 ), convex and lower semicontinuous (lsc), as is the functional 
H : V —> (— 00 , Too]. The level set of J corresponding to a 6 1 is defined by 
lev< a J {u G U : J(u) < a}. The domain of a function / : U —> (— 00 , Too] is 
defined by dom / = {«£[/: f{u) < 00 }. We use the notation <f> : U =4 V to denote 
a set-valued mapping from U to V. 

A proper function / : U —> (— 00 , Too] is strongly convex if there is a constant 
fx > 0 such that 

/ ((! - r)x Q T tx 1 ) < (1 - r)f(x 0 ) T t/(: x x ) - \ht{ 1 - r)||x 0 - aq|| 2 , (1.1) 

for all x 0 and X\ and r G (0,1). We will not assume smoothness of functions and so 
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will require the subdifferential. The subdifferential of a function f : U (— 00 , + 00 ] 
at a point x G dom / is defined by 

df(x) ■.= {v E U \ (v,x -x) < f{x) — f(x ), for all x G U} . (1.2) 

When x f dom / the sub differential is defined to be empty. Elements from the 
subdifferential are called subgradients. The sub differential of a proper, lsc convex 
function is a maximally monotone set-valued mapping [56, Theorem 12.17]. The 
Fenchel conjugate of a function / is denoted by f* and defined by 

f*(y) ■= sup {{y, x) - f(x)}. 

xeu 

A mapping <f> : V V is said to be /3-inverse strongly monotone [56, Corollary 12.55] 
if for all x,x' G V 

(v — v', x — x') > f3\\v — v'\\ 2 , whenever v G <h(x), v' G (1.3) 

The mapping <f> is said to be polyhedral (or piecewise polyhedral [56]) if its graph is 
the union of finitely many sets that are polyhedral convex in U x V [20]. We denote 
the resolvent of <L by := (Id+d*) 1 where Id denotes the identity mapping and the 
inverse is defined as 

<h _1 (y) ■■= {x G U \y G <3>(x)}. (1.4) 

The corresponding reflector is defined by R r ^ := 277 r/ $ — Id. 

Notions of continuity of set-valued mappings have been thoroughly developed over 
the last 40 years. Readers are referred to the monographs [4,20,56] for basic results. 
A mapping $ : U V is said to be Lipschitz continuous if it is closed valued and 
for all u, v! G U there exists a r > 0 such that 

$(«') C <h(u) + t||m' - u\\M. (1.5) 

Lipschitz continuity is, however, too strong a notion for set-valued mappings. A key 
property of set-valued mappings that we will rely on is metric subregularity , which 
can be understood as the property corresponding to a Lipschitz-like continuity of the 
inverse mapping relative to a specific point. As the name suggests, it is a weaker 
property than metric regularity which, in the case of an n x m matrix for instance, is 
equivalent to surjectivity. Our definition follows the characterization of this property 
given in [20, Exercise 3H.4]. 

Definition 1.1 ((strong) metric subregularity). 

(i) The mapping <f> : U V is called metrically subregular at x for y relative to 
W C U if (x , y) G gph $ and there is a constant c > 0 and neighborhoods O of 
x such that 


dist (x, $ l {y) O W) < cdist ( y , <L(x)) V x G O D W. (1.6) 
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(ii) The mapping $ is called strongly metrically subregular at x for y relative to 
W C U if (x, y) G gph $ and there is a constant c > 0 and neighborhoods O of 
x such that 

||x — x|| < cdist (y, $(x)) V x G O fl W. (1.7) 

The constant c measures the stability under perturbations of inclusion y G $(x). 

An important instance where metric subregularity comes for free is for polyhedral 
mappings. 

Proposition 1.2 (polyhedrality implies strong metric sub regularity). Let W C V 
be an affine subspace and T : W =4 W. If T is polyhedral and Fix T fl W is an 
isolated point, {x}, then Id— T : W =1 [W — x) is strongly metrically subregular, 
hence metrically subregular, at x for 0 relative to W. 

Proof If T is polyhedral, so is $ _1 := (Id—T) _1 . Now by [20, Propositions 31.1 and 
31.2], since 4A 1 is polyhedral and x is an isolated point of < h _1 (0)nl'P, then $ = Id — T 
is strongly metrically subregular at x for 0 with constant c on the neighborhood O of 
x restricted to W (1.7). □ 

One prevalent source of polyhedral mappings is the sub differential of piecewise linear- 
quadratic functions (see Proposition 2.6 below). 

Definition 1.3 (piecewise linear-quadratic functions). A function f : M n —» [—oo, +oo] 
is called piecewise linear-quadratic if domf can be represented as the union of finitely 
many polyhedral sets, relative to each of which f{x) is given by an expression of the 
form \ {x, Ax) + (a, x) +a for some scalar a G M vector a G and symmetric matrix 
A G R nxn . 

A notion related to metric regularity is that of weak-sharp solutions. This will be 
used in the development of error bounds (Theorem 3.4). 

Definition 1.4 (weak sharp minimum [16]). The solution set argmin {f(x) \ x G 0} 
for a nonempty closed convex set 0, is weakly sharp if, for p = inf'n f, there exists a 
positive number a (sharpness constant) such that 

f(x) >p + a dist (x, Sf ) VT G O. 

Similarly, the solution set Sf is weakly sharp of order v > 0 if there exists a positive 
number a (sharpness constant) such that, for each igU, 

f(x) >p + a dist (x, SfY Vx G Ll. 

1.2 Preparatory abstract results 

To conclude this section we present general results about types of (firmly) nonexpan- 
sive operators that clarify the underlying mechanisms yielding linear convergence of 
many algorithms. The operative definitions are given here. 
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Definition 1.5 ((.S', e)-(firmly-) nonexpansive mappings). Let D and S be nonempty 
subsets of U and let T be a (multi-valued) mapping from D to U. 

(i) T is called (S', e)-nonexpansive on D if 

\\x + -x + \\<y/T+t\\x-z\\, (1.8) 

Vx G D, Vx G S', Vx + G Tx, Vx + G Tx. 

If (1.8) holds with e = 0 then we say that T is S'-nonexpansive on D. 

(ii) T is called (S', c)-firmly nonexpansive on D if 

||x+ - x+|| 2 + ||(x - x + ) — (x — x+)|| 2 < (1 + e) ||x - x|| 2 , (1.9) 

Vx G D, Vx G S', Vx + G Tx, Vx + G Tx. 

If (1.9) holds with e = 0 then we say that T is S'-firmly nonexpansive on D. If, 
in addition, S = Fix T, then T is said to be quasi-firmly nonexpansive. 

Theorem 1.6 (abstract linear convergence result). Let W C V be an affine subspace 
and T : W =1 W be quasi-firmly nonexpansive on W. Let Fix T D W be an isolated 
point, {x}. If Id— T : W =1 {W — x) is metrically subregular at x for 0, then there 
is a neighborhood O of x such that 

dist (x + , Fix T) < yjl — n dist (x, Fix T), Vx + G Tx, Vx G O D W, (1.10) 

where 0 < k = c~ 2 for c a constant of metric subregularity of Id —T at x for the 
neighborhood O. Consequently, the fixed point iteration x k+1 = Tx k converges linearly 
to Fix T with rate yjl — k for all x° G O D W. 

Proof. Define $ := (Id—T) and note that {x} = (Id—T) _1 (0) {x} = Fix T, 

hence 

dist (x, (Id —T) _1 (0)) = dist (x, Fix T) = ||x — x||. 

Suppose that is metrically subregular at Fix T for 0. Then by Definition 1.1 (i) we 
have, for all x G O D W and for all x + G T(x), 

dist (x, (Id —T) _1 (0)) = ||x — x|| < cdist (0, (x — Tx)) < c||x — x + ||, (1.11) 

which is the coercivity condition of [36, Eq.(3.1), Lemma 3.1]. By assumption, T is 
(Fix T, 0)-firmly nonexpansive (i.e., quasi-firmly nonexpansive) on W (Definition 1.5 
(ii)). The result then follows from [36, Lemma 3.1] with rate \/l — k for k = c -2 . □ 

Remark 1.7 (on k). The constant k in the above theorem can always be chosen to be 
less than or equal to 1. To see this, note that for any metrically subregular mapping 
<L, there is a constant c > 1 and hence a k < 1 so that the rate constant given in 
Theorem 1.6 will always hold whenever the fixed point is a (relatively) isolated point. 
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Example 1.8 (a simple example). Consider two lines, A and B, in R 2 intersecting 
orthogonally at the origin and let T be the Douglas-Rachford operator for the projec¬ 
tions onto each line. In this example T = \ ( RaRb + Id) where Ra ‘IPa — Id for 
the projection onto the line A denoted by Pa, and likewise for Rb■ In the context of 
what follows, Pa is the resolvent of the subdifferential of the indicator function of the 
line A and likewise for P B . It is elementary to verify that T is firmly nonexpansive, 
has a unique fixed point, and T(x) = 0 for all x. Moreover $ = Id— T = Id, which 
has a constant of metric subregularity c — 1. Theorem 1.6 then predicts that the 
Douglas-Rachford algorithm converges linearly with rate constant 0 in this case, i.e. 
it converges in one step. The reader can verify that this indeed is the case. 

To see the importance of the restriction to the affine subspace W, consider instead 
of two lines in R 2 two lines in R 3 intersecting at the origin. It can be shown that the 
fixed points of the Douglas-Rachford operator consist of the axis - let’s call it the z 
axis - extending from the origin, perpendicular to the linear hull of the two lines [6]. 
It is elementary to verify that, from any starting point x° in R 3 , the Douglas-Rachford 
algorithm converges in one step to the intersection of the z axis with the affine subspace 
containing x° and parallel to the plane containing the lines A and B. Clearly, the fixed 
points of the mapping T are not isolated points, but they are isolated points relative 
to the affine subspace containing x° and parallel to A and B, so Theorem 1.6 applies 
and predicts, correctly, that the Douglas-Rachford algorithm converges to a fixed point 
in one step. The projection of this fixed point onto the set B is the solution to the 
problem of finding the point of intersection. 

Corollary 1.9 (Polyhedrality implies linear convergence). Let W C V be an affine 
subspace and T : W =1 W be quasi-firmly nonexpansive on W. Let Fix T fl W be an 
isolated point, {x}. IfT is polyhedral, then there is a neighborhood O of x such that 

dist (x+, Fix T) < y/l — ftdist (x, Fix T) Vx+ G Tx, Vx G O fl W, 

where 0 < k = c ~ 2 for c a constant of metric subregularity of Id — T at x for the 
neighborhood O fl W. Consequently, the fixed point iteration x k+1 = Tx k converges 
linearly to Fix T with rate y/l — k for all x° G O fl W. 

Proof. The result follows immediately from Proposition 1.2 and Theorem 1.6. □ 

The requirement that the fixed point set is a singleton can be viewed as a uniqueness 
assumption, which is common in the inverse problems literature. It is well known, 
however, that, even if the solution to a given problem is unique, the set of fixed 
points of the numerical method (of interest to us, the Douglas-Rachford operator) 
need not be solutions to the given problem, much less be unique [6,44]. Recent 
work has shown, however, that the set of fixed points need only consist of singletons 
relative to appropriate affine subspaces where the iterates lie [37,54], This feature has 
been exploited in the analysis of the Douglas-Rachford algorithm applied to problems 
with polyhedral and quadratic structure [43]. Metric (sub)regularity, on the other 
hand, is one of the central assumptions of well-posedness of inverse problems [20,39]. 
Other useful equivalent characterizations of metric subregularity can be found in [20] . 
Polyhedrality can be quite easy to verify, as we will see below. 
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2 Linear Convergence of Douglas—Rachford/ 
Alternating Directions Method of Multipliers 

We consider problems in the following format: 

minimize J(u) + H(Au). (V) 

u£U 

There are many possibilities for solving such problems. We focus our attention on 
one of the more prevalent methods, the alternating direction method of multipliers, 
abbreviated as ADMM (primary sources include [22,23,29,32,55]). This method 
is one of many splitting methods which are the principle approach to handling the 
computational burden of large-scale, separable problems [15]. ADMM belongs to a 
class of augmented Lagrangian methods whose original motivation was to regularize 
Lagrangian formulations of constrained optimization problems. 

Introducing a new variable v G V, our problem is to solve 

minimize J(u) + H(v), subject to Au = v. (2.1) 

(u,v)&U xV 


The augmented Lagrangian L for (2.1) is given by 

L(u , v, b ) — J(u) + H(y) + ( b , Au — v) + ^||Au — u|| 2 , (2.2) 


where b G V, g > 0 is a fixed penalty parameter. The ADMM algorithm for solving 
(2.1) is, given ( u k ,v k ,b k ), fcGN, compute (u k+1 , v k+1 , b k+1 ) by 


u k+1 G argmin u {J(u) + |||Au — v k + g 1 b k | 2 } ; 

(2.3) 

v k+1 G argmin^ { H(v) + \Au k+1 — v + ^ _1 ^ fc || 2 } ; 

(2.4) 

b k+ 1 = b k + g(Au k+1 -v k+1 ). 

(2.5) 

Using 2 |\Au — v + 77 -1 & fc || 2 — \b k \ 2 = ( b k ,Au — v) + |||Au — v|| 2 , 
(2.3)-(2.5) can be written equivalently as 

the algorithm 

Algorithm 2.1 (ADMM). 

Initialization. Choose g > 0 and (v°,b°) G U x V x V. 

General Step (h — 0, 1,...) 


u k+l G argmin u {J(u) + (b k , Au) + ^|| Au — v k \\ 2 } ; 

(2.6a) 

v k+1 G argmin v {H(v) — ( b k ,v) + ^\\Au k+1 — n|| 2 } ; 

(2.6b) 

b k +i = + r] (A u k+ 1 _ v k+1 ). 

(2.6c) 


The penalty parameter g need not be a constant, and indeed evidence indicates that 
the choice of g can greatly impact the complexity of the algorithm, but this is beyond 
the scope of this investigation, so we have left this parameter fixed. 

We do not specify how the argmin in steps (2.6a)-(2.6b) should be calculated, and 
indeed, the analysis that follows assumes that these can be computed exactly. This 
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is, of course, not true in practice. In an attempt to circumvent this fact, the standard 
approach in numerical analysis is to accommodate summable errors. The general¬ 
ization to summable errors is, however, tantamount to eventual exact evaluation of 
(2.6a)-(2.6b) and thus, for all practical purposes, is no different from immediate exact 
evaluation, the latter involving errors that sum to zero. 

Even if we do assume infinite precision, a few remarks about the computational 
complexity of the individual steps of Algorithm 2.1 are warranted. Inspection of (2.6a) 
shows that an implicit method involving computation of the inverse of A T A may not 
be feasible if this is very large or does not otherwise enjoy a structure that allows 
for efficient inversion. If J is smooth, a number of classical quasi-Newton methods, 
with error bounds, are available [49]. If J is nonsmooth, then a forward-backward- 
type method such as FISTA [10] could be applied. In the latter case new results 
on convergence of the iterates to a solution open the door to error bounds at this 
stage [3]. The second step (2.6b) does not involve any matrix inversion, but will, for 
exact penalization, involve a nonsmooth penalty H. Again, one has recourse to fast 
first-order methods that, as of very recently, permit error bounds. 

Our goal is to determine the rate of convergence of these algorithms so that they 
may be used as inner routines in an iteratively regularized procedure. Knowing that 
an algorithm converges linearly, for instance, yields rational stopping criteria with 
computable estimates for the distance of the current iterate to the solution set. 

We present sufficient conditions for linear convergence of Algorithm 2.1 by showing 
the same for the Douglas-Rachford algorithm which is more amenable to the tools of 
abstract fixed point theory presented in Section 1.2. It is well known [22, 29] that 
the ADMM algorithm can be derived from the Douglas-Rachford algorithm, and vice 
versa, and therefore sufficient conditions for convergence of Douglas-Rachford also 
apply here. The first convergence result for Douglas-Rachford is due to Lions and 
Merrier [44] , under the assumption of strong convexity and Lipschitz continuity of J. 
Recent published work in this direction includes [30,31,34], Convergence rates with 
respect to objective values under various assumptions on the objective, all of which 
involving strong convexity, was established in [34, 50] which is conservative. Local 
linear convergence of the iterates to a solution was established in [14] for linear and 
quadratic programs using spectral analysis. In the first main result, Theorem 2.3, we 
describe two conditions that guarantee linear convergence of the ADMM iterates to 
a solution. The first of these conditions follows from classical results of Lions and 
Merrier [44], The second condition is based on work of more recent vintage [36], is 
much more prevalent in applications and generalizes the results of [14], 

The (Fenchel-Legendre) dual problem corresponding to the problem (V) is (see, 
for instance [12]) 

min J*(A T w ) + H*(—w ). 

w£V 

Here J* and H* are the Fenchel conjugates of J and H respectively. Instead of working 
with this dual, we work with the following equivalent form with the change of variable 

min J*(—A t v) + H*(v). 

8 


v = — w: 


m 


Under the assumption that the solutions u and b of the primal and dual problems exist 
and that the dual gap is zero, the following two inclusions characterize the solutions 
of the problems (V') and ( T > ') respectively: 

0 G dJ{u) + d(H o A)(u)-, 

0 G <9 (J* o (- A t )) (b) + 6H*(b). 

In both cases, one has to solve an inclusion of the form 

0 e(B + D)(x), (2.7) 

for general set-valued mappings B and D. For any 77 > 0, the Douglas-Rachford 
algorithm [21,44] for solving (2.7) is given by 

b k+i eT ’ b k (keN), (2.8) 

for V := J qD (J v b( Id — rjD) + rjD) , (2.9) 

where J v d and J T] b are the resolvents of r/D and r/B respectively. The connection 
between the ADMM algorithm (2.6a)-(2.6c) and the Douglas-Rachford algorithm 
(2.8) was first discovered by Gabay [29] and is derived for convenience in the Appendix. 

Given b° and v° G Db°, following [57], define the new variable x G := b°-\-gv° so that 
b° = J q i)X {) . We thus arrive at an alternative formulation of the Douglas-Rachford 
algorithm (2.8): 

x k+i eTx k (2.10) 

for T := |( R^bR^d + Id) = — Id) + (Id —J^d), (2-11) 

where R^b and R v b are the reflectors of the respective resolvents. This is exactly the 
form of Douglas-Rachford considered in [44] . 

Remark 2.1 (proximal mappings of convex functions). Note that for our application 

B:=d (J* o (—A T )) and D := dH*, (2.12) 

and so the resolvent mappings are the proximal mappings of the convex functions 
( J* o (— A T f) and H* respectively, and hence the resolvent mappings and correspond¬ 
ing fixed point operator T are single-valued [f 7], 

Proposition 2.2. Let J : U gIU {+ 00 } and H : V —>■ M be proper, Isc and convex. 
Let A : U —> V be linear and suppose there exists a solution to 0 G (B + D)(x) 
for B and D defined by (2.12). For fixed g > 0, given any initial points x° and 
(b°,v°) G gphD such that x° = b° + gv°, the sequences (6 fc ) fcgN , (x k ^j and (^ fc ) fcgN 
defined respectively by (2.8), (2.10) andv k := ^ (x k — b k ^j converge to points b G Fix T', 

x G Fix T and v G ^(FixT 7 ). The point b = J v dx is a solution to {V), and 
v = ^ (x — b ) G Db. If, in addition, A has full column rank, then the sequence 
{b k ,v k ) keN corresponds exactly to the sequence of points generated in steps (2.6b) and 
(2.6c) of Algorithm 2.1 and the sequence (w fc+1 ) fcgN generated by (2.6a) converges to 
u, a solution to {V). 
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Proof. Following [22,57], we rewrite the Douglas-Rachford iteration 2.8 in two steps: 
Given (b°,v°) E gpliD, for k E N do 

find (q k+1 ,s k+1 ) E gph(R) such that q k+1 + gs k+1 = b k — qv k ] (2.13a) 
find [b k+1 , v k+1 ) E gph(H) such that b k+1 + gv k+l = q k+l + rjv k . (2.13b) 

The existence and uniqueness in the above steps follows from the representation lemma 
[22, Corollary 3.6.3]. The mappings B,D are maximal monotone operators as the 
subdifferentials of proper lsc convex functions. This together with the fact that the 
solution set of (2.7) is non-empty yields that the sequence (b k ,v k )km defined by the 
algorithm (2.13) converges to some (b,v) such that v E Db and b solves ifD') [57, 
Theorem 1], By the change of variables x k = b k + rjv k , it follows that x k —» x E Fix T 
for T given by (2.11). 

For these definitions of B and D, the sequence (& fc ) fcgN generated by b k := J V D% k 
for x k generated by (2.10) corresponds exactly to the sequence (b k ) keN generated by 
(2.8). Moreover, if A is full column rank, then by the discussion in [22] (see the 
Appendix) both {b k ) keN and the sequence (v k ) keN generated by v k := 3 [x k — 6 fc ) E 
Db k correspond exactly to the sequences of points b k and v k generated by (2.6a)- 
(2.6c). Consequently, by [22, Proposition 3.42] 1 the sequence (w fe ) fcgN defined by 
(2.6a) converges to a solution of {V'). □ 

We now state sufficient conditions guaranteeing linear convergence of the ADMM 
and the Douglas-Rachford algorithms. The first conditions (i) of Theorem 2.3 are 
classical. The second conditions are new. 

Theorem 2.3 (local linear convergence I). Let J : U ->KU {+oo} and H : V —* M 
be proper, lsc and convex. Suppose there exists a solution to 0 E (B + D){x) for 
B and D defined by (2.12) where A : U —>• V is an injective linear mapping. Let 
x E Fix T for T defined by (2.11). For fixed r) > 0 and any given triplet of points 
( b°,v°,x °) satisfying x° := b° + rjv 0 , with v° E Db°, generate the sequence ( v k ,b k )km 
by (2.6a)-(2.6c) and the sequence (x k )km by (2.10). 

(%) Let O C U be a neighborhood of x on which H is strongly convex with con¬ 
stant /i and dH is (3-inverse strongly monotone for some (3 > 0. Then, for 
any ( b°,v°,x° ) E O satisfying x° := b° + rjv° E O, the sequences ( x k )km an d 
(v k ,b k )km converge linearly to the respective points x E Fix T and (b,v) with 

rate at least K = (1 — ^^ 2)2 < 1. 

(ii) Suppose that T : W —» W for some affine subspace W C U with x E W. On the 
neighborhood O of x relative to W, that is O D W, suppose there is a constant 
k > 0 such that 

\\x — x + || > v^dist (x, Fix T) Wx E O fl W, Vx + E Tx. (2.14) 

Then the sequences (x k )k^n and ( v k ,b k ) k£N converge linearly to the respective 
points x E Fix T fl W and (b, v) with rate bounded above by \J 1 — k. 

1 By convergence of v k —> v and b k —> b and the update rule (2.6c), Au k —> v, from which the 
claim follows - see the Appendix. 
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In either case, the limit point b = J v d% is a solution to (V), v G Db and the sequence 
(u k ) keN given by (2.6a) of Algorithm 2.1 converges to u, a solution of {V'). 

Proof. The final statement of the theorem and the statements about the sequence 
(b k ,v k ) follows from Proposition 2.2 where it is shown that the sequence (' v k , b k ) km 
generated by (2.6a)-(2.6c) corresponds to sequences (b k ) keN and (n fc ) fcgN generated 
respectively by (2.8) and v k = 4 ( x k — b k ) G Db k for (x k ) fcgK generated by (2.10). 
The linear convergence of the iterates of Algorithm 2.1 claimed in statements (i) and 
(ii) follows from the properties of the operators T' and T defined respectively by (2.9) 
and (2.11). 

Part (i). Since H is assumed to be strongly convex with fi > 0 the modulus of 
convexity on O, dH is strongly monotone with modulus of monotonicity /j [5, Example 
22.3]. Since dH is also maximally monotone, using the identity dH = ( dH *)~ 1 (see, 
for example, [53, Corollary 3.49]) we conclude that dH* is Lipschitz continuous with 
constant 4. Moreover, since dH is /5-inverse strongly monotone on O, we have for 
any x,y G O 

(u — v,x — y) >/3\\u — v\\ 2 , whenever u G dH(x),v G dH(y). 

Hence dH* is strongly monotone with modulus /5 and Proposition 4 of [44] applies to 
yield linear convergence of the sequences (x fc ) and (b k ) to the respective limit points 
x and b 

\\x k - x|| < LK k ; || b k - 6|| < LK k , (2.15) 

where L is some constant, K = (1 — ^+^ 2 )2 and f = 4 is the Lipschitz constant for 
the set-valued map dH* on O. Now, since v k = 4( x k — b k ), we have for v k —> v : = 
4 (x — b) with the same rate as x k and b k , modulo a constant: 

or ixk 

\\v k - v\\ < 4 (\\x k - x|| + ||6 - 6 fc ||) <-. (2.16) 

^ ^ T) 

This completes the proof of the first statement. A 

Part (ii). Since B and D are maximal monotone operators the reflected resolvents 
R v b and R ri n are nonexpansive [5, Proposition 23.7]. The composition R^bR^d is 
nonexpansive which implies that the mapping T is firmly nonexpansive [5, Proposition 
4.2], and hence quasi-firmly nonexpansive on W. Condition (2.14) is the coercivity 
condition (b) of [36, Lemma 3.1] which guarantees local linear convergence of fixed- 
point iterations for (S, e)-firmly nonexpansive mappings (S C Fix T D W). Quasi- 
firmly nonexpansive mappings, under consideration here, are (Fix T D W, 0)-firmly 
nonexpansive. Thus, by [36, Lemma 3.1] the sequence ( x k ) k eN converges linearly on 
the neighborhood O with rate a/1 — k. Nonexpansiveness of the resolvent J V £> and 
the relations b k = J^D% k and v k = 4 ([ x k — 5 fc ) then complete the proof of the second 
statement. □ 

Remark 2.4. The strong convexity assumption (i) of Theorem 2.3 fails in a wide 
range of applications, and in particular for feasibility problems (minimizing the sum 
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of indicator functions). By Theorem 1.6, case (ii) of Theorem 2.3, in contrast, holds 
in general for mappings T for which Id — T is metrically subregular and the fixed 
point sets are isolated points with respect to an affine subspace to which the iterates 
are confined. The restriction to the affine subspace W is a natural generalization 
for the Douglas-Rachford algorithm, where the iterates are known to stay confined to 
affine subspaces orthogonal to the fixed point set [37,54]- It would be far too restrictive 
to require that Fix T be a singleton on the entire ambient space V rather than with 
respect to just the affine hull of the iterates. We show that metric subregularity with 
respect to this affine subspace holds in many applications. (See also Example 1.8.) 

Remark 2.5. Proposition 2.2 and Theorem 2.3 and their proofs also hold in infinite 
dimensional Hilbert spaces. Lemma 3.1 of [36] is stated for Euclidean spaces, but the 
proof holds also on general Hilbert spaces. 

Proposition 2.6 (polyhedrality of the Douglas-Rachford operator). Let J : U —>■ 
R U {+oo} and H : V —> R be proper, Isc and convex. Suppose, in addition, that J 
and H are piecewise linear-quadratic. The operator T : V —* V defined by (2.11) 
with p > 0 fixed, is polyhedral for B and D given by (2.12) where A : U —>■ V is a 
linear mapping. 

Proof. Since the functions J and H are proper, lsc, convex and piecewise linear- 
quadratic, by [56, Theorem 11.14] so are the Fenchel conjugates, J* and H*. The 
subdifferentials B := d (J* o (— A T )) and D := dH* and their resolvents, therefore, 
are polyhedral mappings [56, Proposition 12.30]. Since the graphs of reflectors R^b 
and R v b correspond to the graphs of their respective resolvents J v b and J v b through 
a linear transformation, R v b and R, v o are also polyhedral mappings. Since by Remark 
2.1 the resolvents J v b and J n D are single-valued, the reflectors R v b and R, n n are also 
single-valued. Therefore T = ^(R^bR^d + I ) is polyhedral as the composition of 
single-valued polyhedral mappings. □ 

Theorem 2.7 (local linear convergence II). Let J : U —» R U {+oo} and H : V —>■ R 
be proper, lsc, convex, piecewise linear-quadratic functions (see Definition 1.3). Define 
the operator T : V —>■ V by (2.11) with r) > 0 fixed and B and D given by (2.12) 
where A : U —$■ V is a linear mapping. Suppose that there exists a solution to 
0 G (B + D)(x), that T : W —> W for W some affine subspace of V and that 
Fix T fl W is an isolated point {x}. Then there is a neighborhood O of x such that, 
for all starting points (x°, v°, b °) with x° := b° + pv 0 G O fl W for v° G D(b°) so that 
Jr]DX 0 = b°, the sequence (x k )ken generated by (2.10) converges linearly to x where 
b := Jnnx is a solution to i[D'). The rate of linear convergence is bounded above by 
y/1 — k, where k = c~ 2 > 0, for c a constant of metric subregularity of Id — T at x for 
the neighborhood O. Moreover, the sequence (b k ,v k ) keN generated by Algorithm 2.1 
converges linearly to (b,v) with v — I(x — b), and the sequence (w fc ) fcgN defined by 
(2.6a) of Algorithm 2.1 converges to a solution to i[P'). 

Proof. By Proposition 2.6 the Douglas-Rachford operator T is polyhedral and thus 
the first statement follows from Corollary 1.9. The statement about the sequences 
generated by Algorithm 2.1 follows as in Theorem 2.3. □ 
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3 Error Bounds and Iterative Penalization 


In this section, we study an iteratively regularized algorithmic scheme for solving the 
problems of the form 

min { J(u) | u e U and fj(Au) < €j, j — 1,2,..., M} , 

where ,/:!/—> (—oo, +oo] is proper lsc and convex, the mapping A: U —» V is linear, 
for all j the nonnegative-valued function fj : V —> M + is convex and smooth (at least 
at points that matter) and e 3 > 0. We refer to the inequality constraints as structured 
constraints. It will be convenient to introduce the following notation that will help 
to reduce clutter. We collect the constraints into a vector-valued function so that we 
can write the problem as 


minimize 

J(u) 

(V) 

ueu 

subject to 

F e (Au ) < 0, 

where 

F e : V ->■ R m := v ^ (fi(v) - 

■ ei, f 2 (v) - e 2 ,.. ■ , /m(v) - e M ) • 

(3.1) 

Here the vector inequality is understood 

as holding element-wise. 



A common approach to solving problems of the type (V) arising from inverse 
problems is to apply implicitly the structured constraint by adding some (usually 
smooth) quantification of the constraint violation into the objective function: 


minimize J(u) + p9(F e (Au)), (V p ) 

u&U 

where 9 : M m — > (—oo, +oo] is a proper, lsc convex function and p > 0. This places 
us in the context of the previous section since problem (Vp) is the specialization of 
(V) with H(Au) = p9(F t (Au)). 

As is often seen in the inverse problems literature, the constraint violation pa¬ 
rameter €j — 0 (j — 1,... ,M), essentially penalizing divergence from the origin. A 
prominent instance of this form of regularization is the squared norm: 9(v) := ||u|| 2 . 
There are many efficient methods available for solving (V p ). It is clear that for a 
certain value of p the optimal solution to ( V p ), u p , will satisfy fj(Au p ) < 1j(p) with 
the effective error ej(p) depending on p. What is not true in general, however, is that 
the solution to (V p ) corresponds to the solution to (V) for the constraint error e(p). 
Moreover, for our intended applications, U is a finite dimensional Euclidean space 
with dimension n and the dimensionality of the constraints M grows superlinearly as 
a function of n , so we would like to consolidate the constraints somehow while ex¬ 
ploiting the phenomenon that, at the solution to (V) relatively few of the constraints 
are in fact tight or active. 

We consider convex penalties that reduce the dimensionality of the constraint 
structure and have the property that 9(F e (Au )) = 0 if and only if F e (Au) < 0. Of 
particular interest among penalties with this property are exact penalties, that is 
penalties 9 with the property that solutions to (V p ) correspond to solutions to (V) for 
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all values of p beyond a certain threshold p. For more background on exact penaliza¬ 
tion see, for example, [11,17,19,25,33,46]. We point also to Friedlander and Tseng [28] 
for a connection between exact penalization and what they call exact regularization as 
this fits well with our viewpoint that the structured constraints F e (Au ) < 0 constitute 
a regularization of the model with regularization parameter e. This illustrates the dis¬ 
tinction between model-based regularization, that is, regularization of the constraints 
motivated by external (eg. statistical) considerations, versus numerical regulariza¬ 
tion motivated solely on the grounds of enabling efficient (approximate) numerical 
solutions to (V). 

While it is nice to know that, with exact penalization, one can achieve an exact cor¬ 
respondence between the original constrained optimization problem and the penalized 
problem, the whole point of relaxing the constraints is to reduce the computational 
burden of strictly enforcing the constraints. As is often done in practice, one gradually 
strengthens the constraints, finding intermediate points that nearly solve the relaxed 
problem and using these as starting points for solving a more strictly penalized prob¬ 
lem. Together with Theorem 3.4 below, the linear convergence rate established in 
Theorems 2.3 and 2.7 of the previous section yield estimates on the distance of inter¬ 
mediate points to the solution set of the relaxed problem as well as estimates on the 
distance to feasibility for the unrelaxed problem. 

3.1 Structured Constraints and penalization 

Define 

C := {u eU \ F e (Au ) < 0} . (3.2) 

This is a closed convex set since the fj are lsc and convex. If there exists some 
a e M. such that C fl lev < a J is nonempty and bounded then ( V ) has a solution [5, 
Theorem 11.9]. This will happen, for instance, if dom (J) DC ^ 0 and J is coercive [5, 
Proposition 11.12], that is J satisfies 

lim J(u) = Too. (3.3) 

||ti ||—yoo 

Such assumptions are naturally satisfied in many applications. Moreover, lev < s J(w), 
the lower level-set of J corresponding to the optimal value a in (V), is convex and 
so the set of optimal solutions to (V) is also convex. Define J p := J T p9(F e o A) 
for the convex, lsc function 9 satisfying 6{w) > 0 for all w and 6{w) = 0 if and only 
if F e (w) < 0. Then J p is convex, lsc and corresponds exactly to J on the set C. 
Otherwise J p increases pointwise to Too at points outside C as p —* oo. For (/r) fcgN 
with p k —>■ oo, the sequence of functions (J Pk ) epi-converges (see [56, Definition 7.1]) 
to J T ic as k —» Too where lq is the indicator function of the set C. As we will 
allow approximate solution of problems (V p ) it will be helpful to recall the set of 
7 -minimizers: 7 — argmin J p := {u \ J p (u ) < inf J p T 7 }. The relation between the 
solution sets to (V) and {V p ) is detailed in the following, which is a direct application 
of [56, Theorem 7.33]. 
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Proposition 3.1. Let J : U ->• (-oo,+oo], F e : V ->■ R M and 9 : M m ->• R 
be proper, Isc and convex, and let A : U —> V be linear. Let J be coercive with 
dom J fl C 7 ^ 0 for C defined by (3.2). Suppose further that 9(w ) > 0 and that 
9(w) — 0 if and only if F e (w) < 0. Define J Pk := J + pk9(F e o A) where pk +oo 
as k /* +oo. Then inf J Pk —>■ inf J + tc < +oo. Moreover, for any sequence of 
errors 7 *, \ 0 and corresponding points u k G 7 ^ — argmin J Pk , the sequence (u k ) k€N 
is bounded, and all its cluster points belong to argmin {J + lq}. 

Proof sketch. The property of the convex penalty 6 that 9{w) > 0 and 9{w) = 0 
if and only if F e (w) < 0 yields epi-convergence of J Pk to J + ic■ Coercivity of J 
guarantees that J p is level bounded for all values of p > 0. These two properties, 
together with lower semicontinuity and the fact that J and J p are proper, are all that 
is needed to prove the result. □ 

If the regularization were exact , then we would know that for all parameter values 
p large enough, the solutions to ifP p ) coincide with solutions to ifP). We return to 
this later. 

3.2 Solution to the regularized Subproblem and error bounds 

We now turn our attention to solution of the problem (V p ) for a fixed value of pk. 
The AD MM algorithm discussed in Section 2 is useful for solving this problem in the 
sense that it has an error bound under specific assumptions which gives a stopping 
rule. This is not unique to Algorithm 2.1, but we focus on this method due to its 
prevalence in practice. 

Recall the exact problem (V)\ 

minimize J(u) 

ueu Pp\ 

subject to F e (Au ) < 0. 

It will be convenient to rewrite the penalized problem 2 (fP p ) as 

minimize - J (u) + 9 (FJAu)). (V p ) 

u£U p 

Consider also the limiting problem 

minimize 6(F e (Au)). ('Poo) 

u£U 

We view problem (fP p ) as the regularized version of (Poo) with J as the regularizing 
functional and ^ as the regularization parameter. Denote the solution sets to these 
problems by 

S := argmin { J(u) \ u G U, F e (Au) < 0} , 

S p := argmin |-J(u) + 9 ( F e {Au )) | u G U \ , 

Soo := argmin {9 ( F e (Au )) | u G U } . 

2 Of course, the value of the problem is not the same, but the solutions are. 
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If the penalization 9 satisfies 9(F e (Au)) = 0 if and only if F e (Au) < 0, then it 
is immediately clear that SA corresponds to the feasible set of problem (V) hence 
S C Soo. What is more remarkable is that, if a Lagrange multiplier for (V) exists, 
then S p = S for all p large enough, that is, the penalty 9 is exact. 

Theorem 3.2 (Theorem 4.2 of [28]). Suppose that S is nonempty and compact, and 
that there exist Lagrange multipliers A for {V). Let the penalization 6 in (fP p ) be 
convex. Assume, moreover, that 9 satisfies the condition 9(F e (Au)) = 0 if and only 
if F e (Au) < 0. Then the solution set to the penalized problem, S p , coincides with the 
solution set to the exact problem, S, for all p > 9°(A) where 9° is the polar function 
of 9 given by 9°( A) = sup ^ 0 

It is easy to check whether a solution u p G S p is in fact feasible for (V) (and 
hence also in S) by simply evaluating the value of 9 ( F q (Au p )). More generally, one 
would check whether the first order optimality conditions for (Poo) are satisfied at u p , 
namely 

0 G 89 (F e o A(-)) at u p . (3.4) 

An explicit formula for the subdifferential in (3.4) for image denoising and deconvolu¬ 
tion is given in Section 4 as this will be needed for computing Step (2.6b) of Algorithm 
2 . 1 . 

If, in addition, is weakly sharp (see Definition 1.4), then one can obtain an 
upper bound for the distance of solutions to (P p ) to feasible solutions to (V), even in 
the absence of Lagrange multipliers for (P). 

Assumption 3.3. 

(i) The solution set SA of problem (Poo) is nonempty. 


(ii) lcv< a J is bounded for each a G M. and inf xe[ / > — oo. 


(in) The solution set S^ of (Poo) is weakly sharp of order v > 1 . 

Theorem 3.4. Suppose Assumption 3.3(i)-(ii) hold. 

(i) For any p > 0, [J p >pS P is bounded. 

(ii) If, in addition, Assumption 3.3( iii) holds with modulus of sharpness u, then for 
any p > 0 there exists r > 0 such that 


dlSt{u p , Styfj S: : G S p , p 'A P' 


(3.5) 


(m) If, in addition, Assumption 3.3( iii) holds and the penalization 9 is exact, then 
for all p large enough, u p E S and dist{u p , S^) = dist(u p , S) = 0 . 

Proof, (i) and (ii). Under the assumption 3.3, Theorem 5.1 in [28] directly applies to 
yield the result. A 

(iii). If the penalization 9 is exact, then 9(F e (Au)) = 0 if and only if F e (Au) < 0, 
hence S = S p for all p large enough, and SA corresponds exactly to the feasible set 
in (P). □ 
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Remark 3.5. The error bound (3.5) holds independent of the existence of Lagrange 
multipliers for ( V ), hence, for exact penalization under Assumption 3.3, Theorem 3.4 
yields an upper bound on the distance of solutions to (fP p ) to feasible points for ( V ). 

4 Application: image deconvolution and denoising 
with statistical multiscale analysis 

We specialize the above results to the application of optimization with statistical 
multiscale side constraints. All of the examples considered in this section satisfy the 
requirements of Theorem 2.7, and thus for each fixed value of the penalty parameter 
p local linear convergence to a solution of (fP p ) is guaranteed. Moreover, the penalty 
function 6 that we use is exact and hence by Theorem 3.2, for p large enough, the 
computed solution to (V p ) is also a solution to (V). What is not known a priori is 
what value of p yields the correspondence. Moreover, since the whole point of the 
relaxation (fP p ) is to remove the burden of satisfying the constraints, we approach a 
solution to (V) via a sequence of solutions to (fP p ) for progressively larger values of p. 
This is described precisely in the following sequentially penalized algorithm. 


Algorithm 4.1 (Exactly Penalized Sequential ADMM). 

Initialization. Given an image y, a sequence of error tolerances (7fc)fceN 
with 0 < 7 ^ — > 0 Choose parameters: (3 > 1 and the penalty parameter 
7 G (0,2). Initialize k = i = 0, ^ 0,0) = 0,u° = y, ■u < '°' 0) = A T y, and compute 
u (o,i) _ ar g m in u | J(-u) + (&(°’°), Au) + %\\Au — rd 0 ’ 0) || 2 + \\\u — u^ 0 ’°)|| 2 }. 

For k — 0,1, 2,... 

• While 

— Compute (if k ’ l+l f 5 ( fc >*+ 1 )) via Algorithm 2.1 steps (2.6b)-(2.6c) with H := 
p k 9 ( F q (•)) for the exact penalty 6 and structured constraints F q . 

— Increment i — i + 1 and calculate ■u ( A* +1 ) via Algorithm 2.1 step (2.6a). 

• Update/reset: Set u (k+1 L) ;= u ( k , l +h an( i p k+1 — fip k ' Set k — k + 1 and 
i = 0. Ifd(F q (u^)) =0, set 7 fc = 0. 

The outer iteration, indexed by k , consists of numerical approximations to solutions of 
(V p ) for the penalty parameter p/,. The the inner iteration proceeds with the current 
value of p k until the step size between successive iterates u k,l+1 and u k ’ 1 drops in a 
linear fashion below a given tolerance 7 ^. From Theorem 2.7 one can then obtain a 
posteriori estimates on the distance of the iterate u k+1, i to the true solution. Then 
p k is increased by a constant factor. Since, for this model the penalization 6 is exact, 
once the constraints appear to be satisfied (as determined by monitoring the value of 
9 (F q ( v k ))), it is reasonable to conclude that the correspondence between problems 
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(Vp) and {V) holds, and the penalty pk no longer needs to be updated; the inner loop 
of the algorithm then can be run to the desired accuracy. As indicated in Figures 
1(b) and 4, the constraints appear to be satisfied when the penalty term p^O ( F q ( v k )) 
(green plot) drops suddenly to machine precision. 

The application problem involves image deconvolution and denoising with statis¬ 
tical multiscale estimation as presented in [2,26,27]. We are well aware that there are 
many ways to model such problems that permit much less computationally intensive 
numerical solutions than the technique we present here. Our interest in multiresolu¬ 
tion deconvolution/denoising is two-fold: first, it is one of the few techniques available 
that has the potential to yield quantitative (i.e. statistical) guarantees for the recov¬ 
ered images, and secondly, it is an important instance of convex optimization problems 
where the number of constraints grows superlinearly as a function of the number of 
unknowns. Our numerical demonstration addresses the first issue of quantitative im¬ 
age denoising: if the numerics do not permit estimates for the distance to the model 
solution, then the quantitative assurances of the model are irrelevant. Unlike the nu¬ 
merical approach proposed in [26,27], the numerical approach we present here permits 
error bounds to within machine accuracy of our numerical solution to the true model 
solution. 

Following the approach proposed in [26] we quantify the difference between an 
estimate v = Au and the data y via the maximum absolute value of all weighted inner 
products of the residual function A(■;?/): R n —)■ M n : 

fj(v) := \(ujj,A(v;y))\, j e {1, 2 ,..., M}. (4.1) 

The residual function used in [26] A is simply v — y. The weights u>j are scaled 
window functions so that the set X C {1,2,... , M} is the index set corresponding 
to all collections of these subsets of the image. The statistical multiscale analysis 
requires that, on each window, 


niax{ fj(v)} < q. (4.2) 

jex 

The same error q is specified at all scales. Hence F e in (3.1) specializes to 

F q : M n -)■ M a/+1 := v ■-> (fi(v) - q, f 2 (v) - q, ..., f M (v) - q, 0) T , (4.3) 

for fj : M n —» M defined by (4.1) (j — 1,, M ) and 

9 : M a/+1 —> M : 6(w) := max{tri, W 2 , ■ ■ ■ ,Wm+i}- (4.4) 

(Here we are expanding the original F e by the constant function f M +i{v) := 0.) 
The max function is a standard tool in exact penalization methods [17,19] and falls 
naturally into the context of piecewise linear-quadratic functions. 

Algorithm 4.1 does not specify how the iterates and are calculated. The 
linear convergence of the inner iterations predicted in Theorem 2.7, from which error 
bounds can be determined, as well as the numerical convergence of the outer iterates 
to problem ( V ) is discussed next. 
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4.1 Prox-evalutation 


Computation of u( k,l+1 ^ and v^ k ’ l+1 l in Algorithm 4.1 involves minimizing the sum of 
a convex quadratic function and (in general) a convex, nonsmooth, piecewise linear- 
quadratic function. This can be solved via any number of techniques ranging from 
first order methods like F1STA [10] to higher-order nonlinear optimization methods 
like quasi-Newton methods studied in [41]. In order to take advantage of the relative 
sparsity of the active constraints, we propose the following (exact) algorithm. 


Algorithm 4.2 (Steepest Subdifferential Descent). 

Initialization. Given b, u, the constant p > 0 and an initial point v°, compute 
the residual r° := b + rjAu — pv° and the projected residual z° := Pd< y e(F h°)))( r °) f or 
d(9(F q (v 0 ))) given by (4.10). 

For l — 0,1, 2,... 

• If z l — r l 

— setv = v l and STOP; 

• else 

— set v l+1 = v l + Xi ( z l — r ; ) where Xi > 0 is the largest constant A such that 
9 ( F q ( v l + A (z l — r 1 ))) = fi ( v l + A (z l — r 1 )) — q for i G I(v l ) with 

I{v)'={j \f j (v)-q = 9(F q {v))} ] (4.5) 

— compute r l+l := b + pAu — pv l+1 and the projected residual 

’=Pd(fi{F q {J + 1 )))( 7 ' ,+1 ); ( 4 - 6 ) 

— increment 1 = 1 + 1. 

Algorithm 4.2 is an active set method and the set I(v) defined by (4.5) is the set of 
active indexes at v. Another helpful interpretation is as a steepest subgradient descent 
method for solving 

argmin v (G(n) := 9 {F q (v)) - (b,v) + %\\Au - u|| 2 } . (4.7) 

The steepest descent step is 

v l+l = v l + A id 1 , 

for dj := Pdc(v) (0) = —r l + z l with z l := P dg r F J v i\\ ( r ' 1 ) and r l = b + p (Au — v l ). 
The choice of the step length A i ensures that, at each step l, the active set is growing; 
specifically, 

/ (V) C / (v l + Aid 1 ) . 
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At termination, the subdifferential 80 (F ? (V)) is large enough that it contains the 
residual r l . The terminal point of Algorithm 4.2, v, is a point in (4.7) since it satisfies 
the first-order optimality conditions: 


0 = z — b — rj (Au — v ) G 80 ( F q (y )) — b — y ( Au — v) = dG(v), (4.8) 

where z = P(dO(F q (v))) (b + y (Au — v)). Replacing u and b with u^ k,l+Vl and re¬ 
spectively yields the update for in Algorithm 4.1. 

The expression for the subcliffferential 86 (F q ) is particularly simple in this case. 
Note that I(v) ^ 0 for all v. Applying the (convex) calculus of subdifferentials to the 
objective 0(F e (v)), as permitted by the regularity of 0 and F (see, for instance [18, 
Section 2.3]), yields 

80 (F q (v)) = co {Vfj(v) | j G I(v )} , (4.9) 

where co denotes the convex hull of a set of points. This, of course, assumes that 
fj is differentiable at v for those j G I(v). Inspection of (4.1) shows that this is not 
the case in general, in particular at points v* where fj(v*) = 0. However, such points 
will never be in the active set I(v*) since /( v*) — q < 0 < 6 (F q (id)) for all q > 0, 
so we can safely apply formula (4.9) without further ado. This yields the following 
specialization for fj(v) = \(wj , v — y)\ given by (4.1): 


80(F q (v)) 


co {Vfj(v) | j G I(v) } (4.10) 

f co {{sign ((Wj, v - I /)) Wj I j G I(v ) \ {M + 1}} , 0} 0(F q (v)) < 0 

1 co {sign ((Wj, v - y)) Wj \ j G I(v) } 0( F q(v)) > 0. 


4.2 Synthetic data 

Fig. 1 shows a set of synthetic exact data u* G M n (shown in blue) and correspond¬ 
ing noisy data y G M. n (shown in green) with n = 512 data points, as well as the 
reconstructed/denoised signal u G M" (shown in red). In this example we consider 
only denoising, that is, the imaging operator A is the identity so v — u. The noisy 
data y was generated by adding i.i.d. Gaussian random noise with standard deviation 
cr = 0.05 to each original data point of u*. In our specialization of problem (V) we 
use the total variation penalty 

J(u) :=a||Vu|& (4.11) 

where V is the (discrete) gradient operator. The structured constraints are given by 
(4.2). The weights Wj G M n are scaled window functions of all intervals of lengths 
between 1 and 20 pixels, and X C {1, 2,..., M} is the index set corresponding to all 
collections of successive pixels in {1,2, ... ,n} of cardinality - or length - from 1 to 
20. The same error q is specified at all scales. 

For a signal length n = 512 with interval lengths from 1 to 20 the number of 
windows is M — 10050. The constant a is, strictly speaking, redundant but was 
introduced as an additional means to balance the contributions of the individual 
terms to make the most of limited numerical accuracy (double precision). We chose 
a = 0.01. The constant q was taken to be 2 a. 
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Figure 1: (a) Original, noisy and reconstructed data for a one-dimensional denoising 
problem, (b) Outer iterates k of Algorithm 4.1 showing solutions, the constraint 
violation, the active set size and the objective value for the penalized problem (V p ) for 
successively larger values of the penalty parameter p. (c) Inner iterates of Algorithm 
4.1 with p 5 = .032: step sizes, constraint violation, objective value and gap between 
the primary, domain-space variables u^ k,l \ 
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Figure 2: (a) Original data (STED image of Tubulin), (b) an enlargement of the 
indicated box to be processed. The length of scale bar in (a) is 1/jrri, the size of the 
reconstruction window (b) is 640 x 640 nm 2 . 


Figure 1(a) shows very good correspondence of the reconstructed signal to the 
original. The multi-resolution constraint prevents the usual “blocky” artifacts com¬ 
mon to image denoising with TV-regularization. The eventual (starting from around 
iteration 15) linear convergence of the algorithm can be seen in Figure 1(c). Un¬ 
der the assumption that the latter iterates are indeed in the region of local lin¬ 
ear convergence, the observed convergence rate is c = 0.9245, which yields an a 
posteriori upper bound on the distance of the 39th iterate to the true solution: 
||u 39 — u *|| < yiy||M 38 — w 39 || = 0.001244. Since the signal length is 512, this amounts 
to 5 digits of accuracy in the pointwise value of the signal. 

4.3 Laboratory data 

For our main demonstration, we are presented with an image y G M n (Figure 2(a)) 
generated from a Stimulated Emission Depletion (STED) microscopy experiment [35, 
38] conducted at the Laser-Laboratorium Gottingen examining tubulin, represented 
as the “object” u G M m . The imaging model is simple linear convolution, Au ~ y 
where A is a convolution matrix with a nonsymmetric experimentally measured point- 
spread function (290nm 2 ). The measurement y is noisy or otherwise inexact, and thus 
an exact solution Au = y is not desirable. Although the noise in such images is usually 
modeled by Poisson noise, a Gaussian noise model with constant variance suffices as 
the photon counts are of the order of 100 per pixel and do not vary significantly across 
the image. Figure 2(b) shows a close-up which we used as the noisy data i/gl 2 with 
n = 64 x 64 data points. We calculate the numerically reconstructed tubulin density 
u shown in Figure 3(a) via Algorithm 4.1 for the problem (V p ) with the qualitative 
objective 

J(u) := o:||u|| 2 . (4-12) 


22 










Figure 3: (a) Numerical reconstruction via Algorithm 4.1 from the imaging data 
shown in Figure 2 for p — 4096. (b) The reconstruction convolved with the measured 
PSF. At each resolution used for the reconstruction, the sum of the pixel values in 
(b) lie within a confidence interval of 3cr of those in Figure 2(b). 


For the image size n — 64 x 64 with the window system of squares of lengths 1 and 
2, the number of windows is M — 8065. The constant a in (4.12) is, strictly speaking, 
redundant but was introduced as an additional means to balance the contributions 
of the individual terms to make the most of limited numerical accuracy (double pre¬ 
cision). We chose a = 0.01. The constant q was chosen so that the model solution 
would be no more than 3 standard deviations from the noisy data on each interval of 
each scale. 

We emphasize that, since this is experimental data, there is no “truth” for com¬ 
parison - the constraint, together with the error bounds on the numerical solution to 
the model solution provide statistical guarantees on the numerical reconstruction [26] . 
The numerical “image” generated from the reconstructed tubulin density, u , is given 
by v = Au and is shown in Figure 3(b); this figure is a denoised version of the 
measured data shown in Figure 2(b). 

In Figure 4(a) a sample run of the algorithm shows a succession of outer iterations. 
The inner iteration is shown in Figure 4(b) with the value of pn = 4096 for which the 
constraints are exactly satisfied (to within machine precision), indicating the corre¬ 
spondence of the computed solution of problem (V p ) to a solution to the exact model 
problem (V). The eventual (starting from around iteration 1500) linear convergence 
of the algorithm can be seen in Figure 4(c). Under the assumption that the latter 
iterates are indeed in the region of local linear convergence, the observed convergence 
rate is c = 0.9997, which yields an a posteriori upper estimate of the pixelwise error 
of about 8.9062e -4 , or 3 digits of accuracy at each pixel. 
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i (and fc=ll) 

(b) 

Figure 4: (a) Outer iterates k of Algorithm 4.1 showing solutions, constraint violation, 
the value of the regularizer, the objective value for the penalized problem (V p ) and 
the active set size for successively larger values of the penalty parameter p. (b) Inner 
iterates of Algorithm 4.1 with pw = 4096: step sizes, constraint violation, objective 
value and gap between the auxiliary, image-space variables and the primary, 
domain-space variables u^ k,l \ 
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5 Concluding remarks 

We have focused our attention on the ADMM algorithm due partly to its prevalence 
in practice, and partly its amenability to our theoretical techniques. The parameter 
r) in Algorithm 2.1 was left constant. How to choose this parameter in the context of 
minimization is a perplexing question and worthy of further study. Our theoretical 
framework can also be adapted to Krasnoselski-Mann relaxations of the Douglas- 
Rachford algorithm. Statements about this will appear in work underway studying 
more generally averaged mappings. 

The statistical interpretation of the reconstruction in Figure 3(b) as described 
in [26, 27] opens the door to a quantitative approach to image processing, but this 
is only valid when one can estimate the distance of the numerical approximation to 
the exact solution to the underlying model optimization problem (V). Determining 
quantitative estimates for how close the numerical solution shown in Figure 3(a) is 
to an exact solution to problem (V) under the assumption of exact evaluation of the 
associated prox operators of has been the topic of our study. 

What is needed and largely missing in the current treatment of algorithms in the 
literature is a complete error analysis accounting for accumulated errors at each stage 
of algorithms - due to finite precision or finite termination of iterative procedures 
- together with statements about how close one can get to the solution to a given 
optimization problem, as opposed to its optimal value, the latter having in general 
no necessary connection to the former. This is a monumental project that has not 
received as much attention in the literature as studies of complexity based upon func¬ 
tion values. As we argued, the standard approach for handling inexact computation 
by assuming summable errors does not solve the problem, it just distributes it over 
infinitely many iterates. An alternative to this was suggested in [40, Section 6] and 
applied in [45] which allows a fixed error over all iterations without compromising 
local linear convergence. More work in this direction would narrow the gap between 
theory and practice. 


Appendix 

Duality of ADMM and the Douglas-Rachford Algorithm. Consider the sequence 
(b k ,v k ) k( _ N of the Douglas-Rachford iteration 2.8, for the case B := d(J* o (— A T ))\ 
D := dH*. Recalling the two-step implementation (2.13), denote p := b k — r)v k and 
p' ._ gk+ 1 . Then (2.13a) is the proximal step p' — (I + r)d(J* o (— A T )))~ 1 p on the 
operator B = d(J* o (— A T )). If A has full column rank, by [22, Proposition 3.32(iv)], 
this step can be performed by 

u k+l = argminj J{u) + (p + gv k , Au) + MAu — v k \\ 2 }; (5.1) 

U Z 

p' = p + gAu k+1 . (5.2) 

Indeed, since A has full rank, J{u) + (p + r]v k ,Au) + ^\\Au — v k \\ 2 is a proper strongly 
convex function of u and has a unique minimizer u k+l . From the optimality condition 
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for (5.1), 


0 G dJ{u k+1 ) + A T (p + 7jAu k+1 ) = dJ(u k+1 ) + A T p'. 

Hence, (u k+1 , — A T p') G gph<9J which implies (— A T p', u k+1 ) G gph<9J*. This gives 

yy (p',u k+1 ) G gph (dJ* o (—A T )) 
yy (p, —Au k+1 ) G gph (—A o dJ* o (— A t )) C gph 9 (J* o (— A t )) . 

Using (5.2), 

(/A \{p-p) e gph9(J* o (H t )) 

yyp' = (/+ 7?<9(J* o (A T )))~ 1 p. 

Substituting p = b k — ryo k in (5.1)-(5.2) yields 

u k+l = argminj J(u) + ( b k — pv k + pv k , An) + 3 ||Am — u fc || 2 }; (5.3) 

U Z 

q k+i = b k _ 7Jv k + 7]Au k+ l. ( 5 . 4 ) 

Similarly, if we denote p := q k+l + r]v k (= b k + r]Au k+1 ) and p' := b k+1 , (2.13b) 
is the proximal step p 1 — (I + pdH*)~ 1 p on the operator D = dH* which can be 
performed via 

v k+1 = argmin {H(v) — (p — pAu k+1 ,v) + ^\\Au k+1 — v|| 2 }; 

V z 

/ _ b _|_1 

P = p ~ TjV . 

Substituting p = b k + pAu k+1 , 

v k+l = argmin {H[y) — ( b k + pAu k+l — ?)Au k+1 ,v ) + ^\\Au k+1 — n|| 2 }; (5.5) 

V z 

b k+i _ b k + r]Au k +i -r) V k +1 . (5.6) 

Now, (5.3)-(5.4) and (5.5)-(5.6) together yield 

u k+1 = argminj J{u) + ( b k ,Au ) + %\\Au — u fc || 2 }; 

U z 

v k+1 = argmin {H(v) — (b k ,v) + ^|| Au k+1 — v|| 2 }; 

V Z 

b k+ 1 = b k + rj(Au k+1 - v k+1 ). 

This is the ADMM algorithm (2.6a)-(2.6c) for the primal problem ( V \). □ 
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